Who feeds the world ? And how wealthy are they ?

There are folium maps in this notebook. If they do not display well, please have a look at the .html file from the same repository.

Abstract

Our main goal in this project is to find social and economic world-wide relations of countries based on the state of their agricultural sector, using indicators such as imports, exports, production, self-sufficiency, etc. In order to find such indicators, we would like to use the data from the "Global Food & Agriculture Statistics" datasets. First, we would like to produce a map showing which countries "feed the world" i.e. which countries are net-exporting food. That map would contain a slider to show how this evolved over the past fifty years. Then we would like to show countrywise the level of food self-sufficiency i.e. the way a country does not need to trade with other ones in order to feed its population. We will also compare it to nations' economic development and see if some correlations can be found. We will visualize our best findings with several interactive maps and plots.

Research questions

We would like to work on the following research questions:

  • How does the production and consumption of food look like from a geographical point of view ? Which countries are net food exporters or importers? How did this evolve over the last few decades ?
  • What's the level of self-sufficiency in food production of individual countries and how does this change over time ?
  • Is there a link between the GDP per capita and the agricultural trade balance ? Are countries that are net exporters or importers richer ? Are self-sufficient countries richer ?
  • If we find any relations, are they also still valid if we check for specific crops ? Are there some crops that are mostly produced by richer countries, some that are mostly produced by poorer countries?

External imports:

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import json
import re
import requests
from bs4 import BeautifulSoup
from ipywidgets import interact
from IPython.display import display

Auxiliary function imports:

We have implemented some functions into a dedicated module (file Milestone_2_scripts.py) in order to simplify the code and make this notebook more enjoyable to read.

In [2]:
#from Milestone_2_scripts import *

Setup:

In [3]:
data_folder_path = "./Data/current_FAO/raw_files/"

files = {"Crops production" : "Production_Crops_E_All_Data_(Normalized).csv",
         "Crops trade" : "Trade_Crops_Livestock_E_All_Data_(Normalized).csv", 
         "Consumer price indices" : "ConsumerPriceIndices_E_All_Data_(Normalized).csv",
         "Macroeconomy" : "Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv",
         "Livestock production" : "Production_Livestock_E_All_Data_(Normalized).csv",
         "Live animals trade" : "Trade_LiveAnimals_E_All_Data_(Normalized).csv"
        }
interesting_datasets = files.keys()

1.A. Dataset description

Our main dataset would be a subset of the "Global Food & Agriculture Statistics" that is found in the proposed datasets list. In this dataset, we have seen that we could work with the production as well as import and export quantities per year and per country. We will add information about countries GDP to this database.

1.B. Loading the data set

In [4]:
def load_datasets(datasets) :
    df = {}
    for dataset in datasets :
        file_path = data_folder_path + files[dataset]
        df[dataset] = pd.read_csv(file_path, encoding = "ISO-8859-1")
    return df

We load each interresting dataset in the dictionary df :

In [5]:
df = load_datasets(interesting_datasets)

1.C. Understanding the data set

In this part, we will have a first look of the datasets in order to get a first sense of the data.

In [6]:
def display_df(df, datasets):
    for dataset in datasets :
        display(dataset, df[dataset].sample(5))

In order to see what does the datasets look like, we display a sample of 5 rows for each of them :

In [7]:
display_df(df, interesting_datasets)
'Crops production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
570978 60 El Salvador 329 Cottonseed 5525 Seed 2000 2000 tonnes 7.0 Fc
648413 74 Gabon 1804 Citrus Fruit,Total 5510 Production 2002 2002 tonnes 1027.0 A
1850937 249 Yemen 1817 Cereals (Rice Milled Eqv) 5419 Yield 1982 1982 hg/ha 9614.0 Fc
524188 55 Dominica 1717 Cereals,Total 5510 Production 1966 1966 tonnes 150.0 A
142607 15 Belgium-Luxembourg 56 Maize 5312 Area harvested 1973 1973 ha 4222.0 NaN
'Crops trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
3993124 70 French Polynesia 1962 Oilseed Cake nes 5622 Import Value 2011 2011 1000 US$ 6.0 A
5035104 101 Indonesia 672 Tea, mate extracts 5922 Export Value 1969 1969 1000 US$ 0.0 NaN
8107552 171 Philippines 507 Grapefruit (inc. pomelos) 5910 Export Quantity 1972 1972 tonnes 0.0 NaN
6908462 28 Myanmar 901 Cheese, whole cow milk 5610 Import Quantity 1983 1983 tonnes 0.0 NaN
7477335 158 Niger 1890 Sugar and Honey 5622 Import Value 1967 1967 1000 US$ 1599.0 A
'Consumer price indices'
Area Code Area Item Code Item Months Code Months Year Code Year Unit Value Flag Note
4680 255 Belgium 23012 Consumer Prices, General Indices (2010 = 100) 7005 May 2006 2006 NaN 92.162773 X 2010
16560 60 El Salvador 23012 Consumer Prices, General Indices (2010 = 100) 7005 May 2000 2000 NaN 71.119831 X 2010
16103 58 Ecuador 23012 Consumer Prices, General Indices (2010 = 100) 7011 November 2003 2003 NaN 77.343872 X 2010
783 4 Algeria 23013 Consumer Prices, Food Indices (2010 = 100) 7010 October 2015 2015 NaN 194.080000 X 2001
37704 150 Netherlands 23012 Consumer Prices, General Indices (2010 = 100) 7003 March 2006 2006 NaN 93.550301 X 2010
'Macroeconomy'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
201973 175 Guinea-Bissau 22008 Gross Domestic Product 6109 Value Local Currency 2011 2011 millions 518288.000000 XAM
9884 6 Andorra 22078 Value Added (Manufacture of tobacco products) 6110 Value US$ 1974 1974 millions NaN NaN
434883 201 Somalia 22016 Value Added (Agriculture, Forestry and Fishing) 6117 Share of GDP in US$, 2005 prices 1996 1996 % 56.734328 Fc
425091 200 Singapore 22077 Value Added (Manufacture of food and beverages) 6110 Value US$ 1996 1996 millions NaN NaN
40169 14 Barbados 22015 Gross Fixed Capital Formation 6158 Share of GDP in Local Currency, 2005 prices 1999 1999 % 20.382349 Fc
'Livestock production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
149116 5402 Northern Europe 1072 Geese and guinea fowls 5112 Stocks 2003 2003 1000 Head 263.0 A
1047 3 Albania 1096 Horses 5111 Stocks 1982 1982 Head 50000.0 F
25019 96 China, Hong Kong SAR 2029 Poultry Birds 5112 Stocks 2006 2006 1000 Head 8900.0 A
135275 5104 Southern Africa 1110 Mules 5111 Stocks 1984 1984 Head 24693.0 A
6804 52 Azerbaijan 1057 Chickens 5112 Stocks 2004 2004 1000 Head 16900.0 *
'Live animals trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
521406 268 EU(25)ex.int 1079 Turkeys 5908 Export Quantity 1995 1995 Head 1133.0 A
481521 225 United Arab Emirates 1921 Bovine, Animals 5622 Import Value 1972 1972 1000 US$ 843.0 A
483256 229 United Kingdom 866 Cattle 5622 Import Value 2011 2011 1000 US$ 52826.0 NaN
377046 117 Republic of Korea 1068 Ducks 5922 Export Value 1991 1991 1000 US$ 2.0 NaN
65017 26 Brunei Darussalam 1169 Animals, live, non-food 5622 Import Value 2001 2001 1000 US$ NaN M

At first glance, our datasets seem very clean.

Each of our dataset contains a column "Year" and a column that is either named "Area" or "Country". This is a great news for us since we want to do a both geographical and time-related analysis.

The columns "Area" and "Country" both correspond to the country except that the "Area" may contains a group of country (e.g. "Eastern Europe").

1.D. Cleansing the data set

In this part, we will clean the datasets. The final goal is to produce one uniformized and normalized dataset on which we could work (see 1.F).

Such a cleaned dataset may look like this (in a very simplistic way):

Country | Year | GDP | Crops production | Livestock production

1.D.a. Removing unuseful data

In this section, we will create dataframes in df_useful which correspond to previous dataframes without the unuseful data.

In [8]:
df_useful = {}
1.D.a.i. Extracting GDP from the "Macroeconomy" dataset
In [9]:
def extract_GDP(df):
    def selection_GDP(df):
        return df['Item']=='Gross Domestic Product'
    def selection_US_dollars(df):
        return df['Element']=="Value US$"
    def drop_columns(df):
        dropped_colmuns = ["Item Code", "Item", "Element Code", "Element", "Flag", "Year Code", "Unit"]
        return df.drop(columns = dropped_colmuns)
    return drop_columns(df[selection_GDP(df)&selection_US_dollars(df)])
In [10]:
df_useful["GDP"] = extract_GDP(df["Macroeconomy"])
In [11]:
display(df_useful["GDP"].sample(5))
Area Code Area Year Value
541484 5204 Central America 1975 1.215663e+05
563470 5706 European Union 2000 8.891990e+06
263478 122 Lesotho 1983 3.554898e+02
146342 56 Dominican Republic 1977 5.740776e+03
393191 184 Rwanda 1972 3.081357e+02
1.D.a.ii. Extracting crops harvested area, production, seed and yield from the "Crops production" dataset
In [12]:
def get_food_crops():
    #Return a list of crops categorized as food crops https://world-crops.com/food-crops/
    url="https://world-crops.com/food-crops/"
    r=requests.get(url,headers={"User-Agent": "XY"})
    soup=BeautifulSoup(r.text,'html.parser')
    elements_temp=soup.find_all('a',href=re.compile("^../"))
    elements=[el.text for el in elements_temp]
    
    #only 40 elements are displayed on each page->iterating on the total list
    for i in range(40,401,40):
        url_i=url+"?ss="+str(i)
        r=requests.get(url_i,headers={"User-Agent":"XY"})
        soup=BeautifulSoup(r.text,'html.parser')
        new_elements=soup.find_all('a',href=re.compile("^../"))
        elements+=[el.text for el in new_elements]
    return elements

def inclusive_search(string,elements):
    #returns true if the string can be found in elements. The search removes special characters from string in order to include more positive results
    string=string.lower()
    delimiters = ",", "(","&",")"," and "," "
    pattern = '|'.join(map(re.escape, delimiters))
    strings=list(filter(None,re.split(pattern,string)))
    found=False
    for s in strings:
        if s=="nes":
            continue
        for el in elements:
            found=(s in el.split())
            if found==False and s[-1]=="s":
                found=s[:-1] in el.split()
            if found==False and s[-2:]=="es":
                found=s[:-2] in el.split()
            if found==False and s[-3:]=="ies":
                found=s[:-3]+"y" in el.split()
            if found==True:
                return found
    return found


def get_food_crop_data(df):    
    #extracts the food crop data, returns 4 df: Area,Production,Seed and yield    
    df=df.copy()
    food_crops=list(map(lambda x: x.lower(),get_food_crops()))              
    crop_types_df=df[['Item','Value']].groupby('Item').sum()
    crop_types_df=crop_types_df[list(map(lambda x : inclusive_search(x,food_crops) , crop_types_df.index ))]   
    food_crop_df=df[df.Item.apply(lambda x: x in crop_types_df.index)]
    return (food_crop_df[food_crop_df.Element=='Area harvested'],
            food_crop_df[food_crop_df.Element=='Production'],
            food_crop_df[food_crop_df.Element=='Seed'],
            food_crop_df[food_crop_df.Element=='Yield'])
  
food_crop_area_df , food_crop_production_df , food_crop_seed_df , food_crop_yield_df = get_food_crop_data(df["Crops production"])
In [13]:
df_useful['Crops Area harvested'] = food_crop_area_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Production'] = food_crop_production_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Seed'] = food_crop_seed_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Yield'] =  food_crop_yield_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [14]:
display(df_useful['Crops Area harvested'].sample(5))
display(df_useful['Crops Production'].sample(5))
display(df_useful['Crops Seed'].sample(5))
display(df_useful['Crops Yield'].sample(5))
Area Code Area Item Year Unit Value
1831317 237 Viet Nam Coconuts 1971 ha 33600.0
120148 16 Bangladesh Oranges 1983 ha 2566.0
291907 36 Cayman Islands Bananas 1977 ha 2.0
11536 3 Albania Grapes 1991 ha 13684.0
1461518 244 Samoa Bananas 1962 ha 10117.0
Area Code Area Item Year Unit Value
1156010 28 Myanmar Beans, dry 1992 tonnes 484800.0
1286412 166 Panama Coffee, green 1962 tonnes 4400.0
1451400 189 Saint Lucia Tomatoes 2014 tonnes 289.0
2397986 5500 Oceania Vetches 1973 tonnes 1800.0
141253 15 Belgium-Luxembourg Cherries, sour 1976 tonnes NaN
Area Code Area Item Year Unit Value
1732301 226 Uganda Cereals (Rice Milled Eqv) 1978 tonnes 23035.0
1216671 157 Nicaragua Rice, paddy 1989 tonnes 4133.0
1083030 136 Mauritania Groundnuts, with shell 1992 tonnes 161.0
85723 10 Australia Roots and Tubers,Total 2009 tonnes 90000.0
2426239 5502 Melanesia Sorghum 1986 tonnes 9.0
Area Code Area Item Year Unit Value
1215053 157 Nicaragua Groundnuts, with shell 2010 hg/ha 55356.0
1091615 137 Mauritius Potatoes 2014 hg/ha 236346.0
981754 121 Lebanon Chillies and peppers, green 1997 hg/ha 176000.0
2583660 5817 Net Food Importing Developing Countries Onions, dry 1989 hg/ha 111357.0
1815081 155 Vanuatu Maize 2013 hg/ha 6333.0
1.D.a.iii. Extracting stocks production from the "Livestock production" dataset
In [15]:
selection_stocks = df['Livestock production']["Element"] == 'Stocks'

df_useful['Livestock production'] = df['Livestock production'][selection_stocks].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [16]:
display(df_useful['Livestock production'].sample(5))
Area Code Area Item Year Unit Value
140448 5207 South America Cattle 1973 Head 198230140.0
100695 244 Samoa Cattle 1994 Head 26000.0
19428 115 Cambodia Pigs 1995 Head 2043900.0
130117 5000 World Ducks 1981 1000 Head 360244.0
19526 115 Cambodia Poultry Birds 1985 1000 Head 8898.0
1.D.a.iv. Extracting import and export quantities from the "Live animals trade" and "Crops trade" datasets
In [17]:
selection_import_quantities = df['Live animals trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Live animals trade']["Element"] == 'Export Quantity'

df_useful['Live animals import quantities'] = df['Live animals trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Live animals export quantities'] = df['Live animals trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [18]:
display(df_useful['Live animals import quantities'].sample(5))
Area Code Area Item Year Unit Value
599946 5400 Europe Pigeons, other birds 1966 1000 Head 87.0
608232 5401 Eastern Europe Bovine, Animals 1978 Head 76803.0
498500 155 Vanuatu Cattle 1982 Head NaN
356773 169 Paraguay Horses 1964 Head 0.0
456579 154 The former Yugoslav Republic of Macedonia Horses 1997 Head 0.0
In [19]:
display(df_useful['Live animals export quantities'].sample(5))
Area Code Area Item Year Unit Value
589576 5303 Southern Asia Mules 1984 Head NaN
657591 5815 Low Income Food Deficit Countries Turkeys 2000 1000 Head 4.0
63848 239 British Virgin Islands Sheep 1998 Head NaN
107225 41 China, mainland Ducks 1968 1000 Head 0.0
206148 91 Guyana Chickens 1978 1000 Head 0.0
In [20]:
selection_import_quantities = df['Crops trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Crops trade']["Element"] == 'Export Quantity'

df_useful['Crops import quantities'] = df['Crops trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops export quantities'] = df['Crops trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [21]:
display(df_useful['Crops import quantities'].sample(5))
Area Code Area Item Year Unit Value
13184504 5305 Western Asia Hides, buffalo, wet salted 1980 tonnes 0.0
5581936 112 Jordan Malt 2000 tonnes 901.0
7480931 159 Nigeria Bread 2012 tonnes 39.0
5358570 106 Italy Meat, chicken 1976 tonnes 8634.0
7059873 149 Nepal Melons, other (inc.cantaloupes) 1973 tonnes NaN
In [22]:
display(df_useful['Crops export quantities'].sample(5))
Area Code Area Item Year Unit Value
7225037 153 New Caledonia Juice, grape 2002 tonnes 0.0
372083 22 Aruba Coconuts, desiccated 2013 tonnes 0.0
8378817 179 Qatar Tea, mate extracts 1983 tonnes NaN
2115428 40 Chile Vegetables, preserved nes 1983 tonnes 90.0
3399267 58 Ecuador Tomatoes 1974 tonnes 0.0
1.D.a.v. Extracting average CPI of each year from the "Consumer price indices" dataset
In [23]:
df_useful['Consumer price indices'] =  df['Consumer price indices'][['Area',"Year",'Value']] \
                                        .dropna() \
                                        .groupby(['Area',"Year"]) \
                                        .mean() \
                                        .reset_index() \
                                        .dropna()
In [24]:
display(df_useful['Consumer price indices'].sample(5))
Area Year Value
689 Colombia 2002 64.158531
2281 Palau 2007 88.750000
3035 Timor-Leste 2004 73.593426
2342 Paraguay 2016 142.769098
2305 Panama 2015 110.468197
1.D.a.vi. Removing areas which are not countries

Having a more detailled look at the dataset, we have remarked that the areas which are real countries are exactely the ones with an "Area Code" below 5000.

In [25]:
#remove Area code >= 5000
for df_name in df_useful :
    if 'Area Code' in df_useful[df_name].keys() : 
        print ("Removing areas which are not countries in", df_name)
        selection_countries = df_useful[df_name]['Area Code']<5000
        df_useful[df_name] = df_useful[df_name][selection_countries]
        display(df_useful[df_name].sample(5))
    else :
        print (df_name, "is already clean")
Removing areas which are not countries in GDP
Area Code Area Year Value
244993 114 Kenya 2008 35896.878739
285873 132 Maldives 2005 1119.806502
46643 23 Belize 2013 1624.294261
159101 178 Eritrea 2013 3502.366233
39505 14 Barbados 1975 476.013920
Removing areas which are not countries in Crops Area harvested
Area Code Area Item Year Unit Value
1726333 226 Uganda Chillies and peppers, dry 1996 ha 3800.0
70718 10 Australia Beans, dry 1974 ha 3230.0
764689 93 Haiti Roots and Tubers,Total 2000 ha 197515.0
760174 93 Haiti Potatoes 1995 ha 542.0
1534308 202 South Africa Oats 1995 ha 28000.0
Removing areas which are not countries in Crops Production
Area Code Area Item Year Unit Value
526863 56 Dominican Republic Carrots and turnips 1978 tonnes 510.0
1825430 236 Venezuela (Bolivarian Republic of) Sugar cane 1964 tonnes 3227409.0
1544790 203 Spain Beans, dry 1982 tonnes 73300.0
1004661 124 Libya Garlic 1978 tonnes 350.0
436931 107 Côte d'Ivoire Papayas 1986 tonnes NaN
Removing areas which are not countries in Crops Seed
Area Code Area Item Year Unit Value
1767347 215 United Republic of Tanzania Yams 1965 tonnes 440.0
1438930 184 Rwanda Soybeans 2013 tonnes 943.0
914235 110 Japan Sweet potatoes 1992 tonnes 46000.0
541890 58 Ecuador Groundnuts, with shell 1992 tonnes 638.0
1393394 117 Republic of Korea Rye 2006 tonnes 0.0
Removing areas which are not countries in Crops Yield
Area Code Area Item Year Unit Value
968330 120 Lao People's Democratic Republic Grapefruit (inc. pomelos) 1971 hg/ha 87500.0
640599 70 French Polynesia Melons, other (inc.cantaloupes) 1996 hg/ha 103742.0
1765125 215 United Republic of Tanzania Sorghum 2002 hg/ha 9700.0
1658864 176 Timor-Leste Beans, green 2004 hg/ha 8830.0
875995 106 Italy Broad beans, horse beans, dry 1986 hg/ha 13553.0
Removing areas which are not countries in Livestock production
Area Code Area Item Year Unit Value
81778 151 Netherlands Antilles (former) Goats 1985 Head 16000.0
128370 248 Yugoslav SFR Poultry Birds 1970 1000 Head 40855.0
50791 89 Guatemala Sheep and Goats 1989 Head 520000.0
1396 3 Albania Poultry Birds 2007 1000 Head 7112.0
123078 240 United States Virgin Islands Poultry Birds 2011 1000 Head 40.0
Removing areas which are not countries in Live animals import quantities
Area Code Area Item Year Unit Value
31621 13 Bahrain Goats 1994 Head 77508.0
285853 133 Mali Asses 2011 Head 0.0
488877 215 United Republic of Tanzania Sheep and Goats 1961 Head 0.0
151820 58 Ecuador Rabbits and hares 1991 1000 Head NaN
361920 171 Philippines Beehives 1970 No NaN
Removing areas which are not countries in Live animals export quantities
Area Code Area Item Year Unit Value
465330 222 Tunisia Animals live nes 1975 Head NaN
71242 27 Bulgaria Bovine, Animals 1972 Head 32735.0
263318 121 Lebanon Animals live nes 2003 Head 0.0
413803 200 Singapore Rabbits and hares 1991 1000 Head NaN
137397 250 Democratic Republic of the Congo Cattle 1984 Head 0.0
Removing areas which are not countries in Crops import quantities
Area Code Area Item Year Unit Value
10978312 231 United States of America Jute+Bast Fibres 1984 tonnes 10090.0
1547175 27 Bulgaria Flour, pulses 1996 tonnes 0.0
1254080 19 Bolivia (Plurinational State of) Meat Bovine Fresh 2002 tonnes 45.0
8895684 193 Sao Tome and Principe Fruit, dried nes 1964 tonnes 1.0
8863802 244 Samoa Nuts, nes 1988 tonnes 3.0
Removing areas which are not countries in Crops export quantities
Area Code Area Item Year Unit Value
2225202 96 China, Hong Kong SAR Flour, potatoes 1994 tonnes 3448.0
759033 16 Bangladesh Fructose and syrup, other 1997 tonnes NaN
6109003 126 Lithuania Spices, nes 1998 tonnes 201.0
6442741 133 Mali Pulses 2009 tonnes 0.0
8965613 194 Saudi Arabia Pastry 1986 tonnes 886.0
Consumer price indices is already clean

1.D.b. Handling of the missing data

In this section, we will explain how we will handle the missing data in previous dataframes for maps.

1.D.b.i. Highlighting the problem
In [26]:
df_useful["GDP"][df_useful["GDP"]['Area']=='USSR'].plot(x ='Year', y='Value', kind = 'line')
df_useful["GDP"][df_useful["GDP"]['Area']=='Ukraine'].plot(x ='Year', y='Value', kind = 'line')
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x1b30545c848>
1.D.b.ii. Proposed correction
In [27]:
countries_formation_years = {}
for country in df_useful["GDP"]["Area"].unique():
    selection = df_useful["GDP"]["Area"] == country
    year_in, year_out = df_useful["GDP"][selection].dropna()["Year"].min(), df_useful["GDP"][selection].dropna()["Year"].max()
    for year in (year_in, year_out):
        if year not in countries_formation_years :
            countries_formation_years[year] = []
    countries_formation_years[year_in].append((country,'+'))
    countries_formation_years[year_out].append((country,'-'))

countries_formation_years.pop(1970)
countries_formation_years.pop(2015)
print (countries_formation_years)
{1990: [('Armenia', '+'), ('Azerbaijan', '+'), ('Belarus', '+'), ('Bosnia and Herzegovina', '+'), ('Croatia', '+'), ('Czechia', '+'), ('Eritrea', '+'), ('Estonia', '+'), ('Ethiopia', '+'), ('Georgia', '+'), ('Kazakhstan', '+'), ('Kyrgyzstan', '+'), ('Latvia', '+'), ('Lithuania', '+'), ('Montenegro', '+'), ('Republic of Moldova', '+'), ('Russian Federation', '+'), ('Serbia', '+'), ('Slovakia', '+'), ('Slovenia', '+'), ('Tajikistan', '+'), ('The former Yugoslav Republic of Macedonia', '+'), ('Timor-Leste', '+'), ('Turkmenistan', '+'), ('Ukraine', '+'), ('Uzbekistan', '+')], 2005: [('Curaçao', '+'), ('Sint Maarten (Dutch Part)', '+')], 1989: [('Czechoslovakia', '-'), ('Ethiopia PDR', '-'), ('USSR', '-'), ('Yemen', '+'), ('Yugoslav SFR', '-')], 1999: [('Kosovo', '+')], 2012: [('Netherlands Antilles (former)', '-')], 2008: [('South Sudan', '+'), ('Sudan', '+')], 2007: [('Sudan (former)', '-')], 1988: [('Yemen Ar Rp', '-'), ('Yemen Dem', '-')]}

1.E. Preprocessing the data set

In this part, we will finish prepocessing the datasets. More precisely, we will deal with country names and normalizing the features.

1.E.a. Converting country names between different naming conventions

In [28]:
dic = {'Czechia': "Czech Republic",
       'Russian Federation':'Russia',
       "Serbia":"Republic of Serbia",
       'The former Yugoslav Republic of Macedonia':'Macedonia',
       'China, mainland':'China',
       'Viet Nam':'Vietnam',
       'Venezuela (Bolivarian Republic of)':'Venezuela',
       'Iran (Islamic Republic of)':'Iran',
       'Syrian Arab Republic':"Syria",
       'Bolivia (Plurinational State of)': 'Bolivia',
       "Côte d'Ivoire": "Ivory Coast",
       'Congo':"Republic of the Congo",
       "Lao People's Democratic Republic":'Laos',
       "Democratic People's Republic of Korea":"North Korea",
       'Republic of Korea':"South Korea"}

def correct_country_names(old_name):
    if old_name in dic.keys() :
        return dic[old_name]
    return old_name
In [29]:
for df_name in df_useful :
    print (df_name)
    df_useful[df_name]["Area"] = df_useful[df_name]["Area"].apply(correct_country_names)
GDP
Crops Area harvested
Crops Production
Crops Seed
Crops Yield
Livestock production
Live animals import quantities
Live animals export quantities
Crops import quantities
Crops export quantities
Consumer price indices
In [30]:
def visualise_world_data_folium(df, year, logScale=True):
    dic = {'USSR':                            ['Armenia', 'Azerbaijan','Belarus', 'Estonia', 'Georgia',
                                               'Kazakhstan', 'Kyrgyzstan', 'Latvia', 'Lithuania',
                                               'Montenegro', 'Republic of Moldova', 'Russia',
                                               'Republic of Serbia', 'Timor-Leste', 'Turkmenistan', 'Ukraine',
                                               'Uzbekistan'],
           'Ethiopia PDR':                     ['Eritrea','Ethiopia'],
           'Yugoslav SFR':                     ['Kosovo', 'Slovenia', 'Croatia',
                                                'Macedonia', 'Bosnia and Herzegovina'],
           'Yemen Dem' :                       ['Yemen'],        
           'Czechoslovakia':                   ["Czech Republic", 'Slovakia'],
           'Netherlands Antilles (former)':    ['Curaçao', 'Sint Maarten (Dutch Part)'],
           'Sudan (former)':                   ['South Sudan', 'Sudan']
          }
    def add_new_names(old_name):
        if old_name in dic.keys() :
            return dic[old_name]
        return old_name
    to_plot=df[df["Year"]==year]
    to_plot=(to_plot[['Area','Value']]
             .dropna()
             .groupby('Area')             
             .mean()
             .reset_index()
             .dropna())    
    to_plot['Area']=to_plot['Area'].apply(add_new_names)
    to_plot = to_plot.explode('Area')
    if logScale :
        to_plot.Value=np.log10(to_plot.Value)
    
    m = folium.Map(location=[40,-10],zoom_start=1.6)
    folium.Choropleth(
        geo_data=f"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json",
        data=to_plot,
        columns=['Area', 'Value'],
        key_on='feature.properties.name',
        fill_color='YlGn',fill_opacity=0.7,line_opacity=0.2,nan_fill_opacity=0.0
    ).add_to(m)

    folium.LayerControl().add_to(m)

    return(m)
In [31]:
display(visualise_world_data_folium(df_useful["GDP"], 1985, True))

1.E.b. Normalization and log scales

TODO, explain why (heavy tail, right skewed, power laws) + do it

For instance the distribution of GDP look a bit like a power law.

In [32]:
sns.distplot(df_useful["GDP"]["Value"], rug=False, hist=False)
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x1b30eb78708>
In [33]:
#looks better with log scale
sns.distplot(np.log(df_useful["GDP"]["Value"]), rug=False, hist=False)
Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x1b305ca6548>

1.F. Making one uniformized dataframe

In this part, we will make one uniformized dataframe uni_df with the following columns.

Country | Year | GDP | Crops production columns | Livestock production columns | Crops importation columns | Livestock importation columns | Crops exportation columns | Livestock exportation | CPI

In this uniformized dataframe, a tuple (Country, Year) uniquely identifies a row.

1.F.a. Pivoting dataframes with items

In [34]:
need_pivot = ['Crops Area harvested',
              'Crops Production',
              'Crops Seed',
              'Crops Yield',
              'Livestock production',
              'Live animals import quantities',
              'Live animals export quantities',
              'Crops import quantities',
              'Crops export quantities']

def rename_columns(x, word):
    if x not in ['Area', 'Year', 'ha', 'tonnes', 'hg/ha', 'Head', '1000 Head']:
        return x + ' ' + word
    return x

df_useful['GDP'] = df_useful['GDP'].rename(columns = {'Value':'(GDP, million $)'})[["Area",'Year','(GDP, million $)']]
df_useful['Consumer price indices'] = df_useful['Consumer price indices'].rename(columns = {'Value':'(Consumer price indices, %)'})[["Area",'Year','(Consumer price indices, %)']]

for df_name in need_pivot :
    df_useful[df_name] = pd.pivot_table(df_useful[df_name], index=["Area",'Year'], columns=["Item","Unit"], values="Value").rename(columns=lambda x: rename_columns(x, df_name))
    display(df_useful[df_name].sample(5))
Item Anise, badian, fennel, coriander Crops Area harvested Apples Crops Area harvested Apricots Crops Area harvested Areca nuts Crops Area harvested Artichokes Crops Area harvested Asparagus Crops Area harvested Avocados Crops Area harvested Bambara beans Crops Area harvested Bananas Crops Area harvested Barley Crops Area harvested ... Sweet potatoes Crops Area harvested Tangerines, mandarins, clementines, satsumas Crops Area harvested Taro (cocoyam) Crops Area harvested Tomatoes Crops Area harvested Tung nuts Crops Area harvested Vegetables&Melons, Total Crops Area harvested Vetches Crops Area harvested Watermelons Crops Area harvested Wheat Crops Area harvested Yams Crops Area harvested
Unit ha ha ha ha ha ha ha ha ha ha ... ha ha ha ha ha ha ha ha ha ha
Area Year
Netherlands 1994 300.0 16500.0 NaN NaN NaN 2400.0 NaN NaN NaN 43700.0 ... NaN NaN NaN 1200.0 NaN 73430.0 NaN NaN 121600.0 NaN
Kuwait 1969 NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 ... NaN NaN NaN 188.0 NaN 558.0 NaN NaN NaN NaN
Gabon 1964 NaN NaN NaN NaN NaN NaN NaN NaN 1300.0 NaN ... 1000.0 NaN 2000.0 NaN NaN 4800.0 NaN NaN NaN 7500.0
Niger 2013 NaN NaN NaN NaN NaN NaN NaN 68301.0 NaN NaN ... 4621.0 NaN NaN 10100.0 NaN 79239.0 NaN 1930.0 1883.0 NaN
Jordan 1970 NaN 400.0 NaN NaN NaN NaN NaN NaN 1040.0 40848.0 ... NaN NaN NaN 13183.0 NaN 26774.0 8285.0 4880.0 222839.0 NaN

5 rows × 120 columns

Item Anise, badian, fennel, coriander Crops Production Apples Crops Production Apricots Crops Production Areca nuts Crops Production Artichokes Crops Production Asparagus Crops Production Avocados Crops Production Bambara beans Crops Production Bananas Crops Production Barley Crops Production ... Sweet potatoes Crops Production Tangerines, mandarins, clementines, satsumas Crops Production Taro (cocoyam) Crops Production Tomatoes Crops Production Tung nuts Crops Production Vegetables&Melons, Total Crops Production Vetches Crops Production Watermelons Crops Production Wheat Crops Production Yams Crops Production
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Brunei Darussalam 2003 NaN NaN NaN NaN NaN NaN NaN NaN 792.0 NaN ... 229.0 NaN NaN 104.0 NaN 14606.0 NaN 114.0 NaN NaN
Lesotho 1974 NaN NaN NaN NaN NaN NaN NaN NaN NaN 350.0 ... NaN NaN NaN NaN NaN 18000.0 NaN NaN 57000.0 NaN
Réunion 1978 NaN NaN NaN NaN NaN NaN 500.0 NaN 4240.0 NaN ... 4000.0 NaN NaN 3000.0 NaN 14190.0 NaN NaN NaN NaN
United States of America 1974 NaN 2984486.0 84867.0 NaN 31842.0 118115.0 67374.0 NaN 2994.0 6502621.0 ... 605417.0 558823.0 4066.0 7279122.0 NaN 23593121.0 NaN 1044799.0 48496688.0 NaN
Egypt 1966 1000.0 6145.0 10034.0 NaN 16155.0 NaN NaN NaN 85413.0 102070.0 ... 83050.0 78761.0 35000.0 1365959.0 NaN 5282934.0 1000.0 1134348.0 1465000.0 NaN

5 rows × 122 columns

Item Anise, badian, fennel, coriander Crops Seed Bambara beans Crops Seed Bananas Crops Seed Barley Crops Seed Beans, dry Crops Seed Broad beans, horse beans, dry Crops Seed Buckwheat Crops Seed Cabbages and other brassicas Crops Seed Carrots and turnips Crops Seed Cassava Crops Seed ... Sorghum Crops Seed Soybeans Crops Seed Sugar cane Crops Seed Sweet potatoes Crops Seed Taro (cocoyam) Crops Seed Vegetables&Melons, Total Crops Seed Vetches Crops Seed Watermelons Crops Seed Wheat Crops Seed Yams Crops Seed
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Samoa 2011 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN 2.0 NaN NaN NaN NaN
Trinidad and Tobago 2005 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Cambodia 1997 NaN NaN NaN NaN 1007.0 NaN NaN NaN NaN NaN ... NaN 1239.0 12000.0 NaN NaN NaN NaN NaN NaN NaN
Nicaragua 1965 NaN NaN NaN NaN 2264.0 NaN NaN NaN NaN NaN ... 758.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN
Cabo Verde 1967 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 52 columns

Item Anise, badian, fennel, coriander Crops Yield Apples Crops Yield Apricots Crops Yield Areca nuts Crops Yield Artichokes Crops Yield Asparagus Crops Yield Avocados Crops Yield Bambara beans Crops Yield Bananas Crops Yield Barley Crops Yield ... Sweet potatoes Crops Yield Tangerines, mandarins, clementines, satsumas Crops Yield Taro (cocoyam) Crops Yield Tomatoes Crops Yield Tung nuts Crops Yield Vegetables&Melons, Total Crops Yield Vetches Crops Yield Watermelons Crops Yield Wheat Crops Yield Yams Crops Yield
Unit hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha ... hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha
Area Year
Bolivia 1961 NaN 59259.0 NaN NaN NaN NaN 50000.0 NaN 238095.0 6582.0 ... 102000.0 73684.0 NaN 120000.0 NaN 31580.0 NaN NaN 5625.0 NaN
Honduras 1999 NaN 42857.0 NaN NaN NaN NaN 32000.0 NaN 201657.0 NaN ... 63542.0 NaN NaN 100535.0 NaN 138301.0 NaN 193738.0 4750.0 NaN
Dominican Republic 1982 NaN NaN NaN NaN NaN NaN NaN NaN 152381.0 NaN ... 49974.0 NaN NaN 218729.0 NaN 130925.0 NaN NaN NaN 58391.0
Sri Lanka 1981 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 95141.0 NaN NaN 51580.0 NaN 45522.0 NaN NaN NaN NaN
Mozambique 2006 NaN NaN NaN NaN NaN NaN NaN NaN 87500.0 NaN ... 84628.0 37647.0 NaN 101818.0 NaN 52253.0 NaN NaN 12353.0 NaN

5 rows × 120 columns

Item Animals live nes Livestock production Asses Livestock production Beehives Livestock production Buffaloes Livestock production Camelids, other Livestock production Camels Livestock production Cattle Livestock production Cattle and Buffaloes Livestock production Chickens Livestock production Ducks Livestock production ... Horses Livestock production Mules Livestock production Pigeons, other birds Livestock production Pigs Livestock production Poultry Birds Livestock production Rabbits and hares Livestock production Rodents, other Livestock production Sheep Livestock production Sheep and Goats Livestock production Turkeys Livestock production
Unit Head Head No Livestock production Head Head Head Head Head 1000 Head 1000 Head ... Head Head 1000 Head Head 1000 Head 1000 Head 1000 Head Head Head 1000 Head
Area Year
Saint Lucia 1973 NaN 500.0 NaN NaN NaN NaN 8200.0 8200.0 112.0 NaN ... 1000.0 1000.0 NaN 7850.0 112.0 NaN NaN 10442.0 18497.0 NaN
Yemen 1972 NaN 720000.0 40700.0 NaN NaN 230000.0 980000.0 980000.0 2498.0 NaN ... 4000.0 NaN NaN NaN 2498.0 NaN NaN 2467000.0 5021000.0 NaN
Netherlands 1970 NaN NaN 60000.0 NaN NaN NaN 3679000.0 3679000.0 56209.0 968.0 ... 98897.0 NaN NaN 5650360.0 58173.0 NaN NaN 610272.0 623678.0 996.0
Greece 1984 NaN 198939.0 1197501.0 825.0 NaN NaN 754751.0 755576.0 29082.0 105.0 ... 79999.0 94084.0 1401.0 1064803.0 30789.0 1776.0 NaN 8252435.0 12995983.0 158.0
Brunei Darussalam 1983 NaN NaN NaN 11000.0 NaN NaN 2000.0 13000.0 2033.0 32.0 ... NaN NaN NaN 13000.0 2065.0 NaN NaN NaN 1000.0 NaN

5 rows × 22 columns

Item Animals live nes Live animals import quantities Asses Live animals import quantities Beehives Live animals import quantities Bovine, Animals Live animals import quantities Buffaloes Live animals import quantities Camelids, other Live animals import quantities Camels Live animals import quantities Cattle Live animals import quantities Chickens Live animals import quantities ... Mules Live animals import quantities Pigeons, other birds Live animals import quantities Pigs Live animals import quantities Rabbits and hares Live animals import quantities Rodents, other Live animals import quantities Sheep Live animals import quantities Sheep and Goats Live animals import quantities Turkeys Live animals import quantities
Unit Head Head No Live animals import quantities Head Head Head Head Head 1000 Head Head ... Head 1000 Head Head Head 1000 Head 1000 Head Head Head 1000 Head Head
Area Year
Cabo Verde 1970 NaN NaN NaN 0.0 NaN NaN NaN 0.0 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
China, Macao SAR 2005 0.0 0.0 NaN 5746.0 0.0 NaN NaN 5746.0 3047.0 NaN ... NaN 0.0 NaN 120869.0 NaN NaN 0.0 1400.0 NaN NaN
Togo 1997 0.0 0.0 NaN 11404.0 NaN NaN NaN 11404.0 39.0 NaN ... NaN NaN NaN 13.0 NaN NaN 5058.0 16736.0 0.0 NaN
Egypt 2001 0.0 0.0 0.0 239248.0 0.0 NaN 99651.0 239248.0 2583.0 NaN ... NaN NaN NaN NaN 0.0 NaN 147542.0 147542.0 100.0 NaN
Syria 1970 NaN NaN NaN 23761.0 NaN NaN 0.0 23761.0 0.0 NaN ... 0.0 NaN NaN 0.0 NaN NaN 221000.0 221000.0 NaN NaN

5 rows × 24 columns

Item Animals live nes Live animals export quantities Asses Live animals export quantities Beehives Live animals export quantities Bovine, Animals Live animals export quantities Buffaloes Live animals export quantities Camelids, other Live animals export quantities Camels Live animals export quantities Cattle Live animals export quantities Chickens Live animals export quantities ... Mules Live animals export quantities Pigeons, other birds Live animals export quantities Pigs Live animals export quantities Rabbits and hares Live animals export quantities Rodents, other Live animals export quantities Sheep Live animals export quantities Sheep and Goats Live animals export quantities Turkeys Live animals export quantities
Unit Head Head No Live animals export quantities Head Head Head Head Head 1000 Head Head ... Head 1000 Head Head Head 1000 Head 1000 Head Head Head 1000 Head Head
Area Year
Ghana 1975 NaN NaN NaN NaN NaN NaN NaN NaN 19.0 NaN ... NaN NaN NaN NaN NaN NaN NaN 0.0 NaN NaN
British Virgin Islands 1981 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN ... NaN NaN NaN 0.0 NaN NaN 0.0 0.0 NaN NaN
Thailand 1978 NaN NaN NaN 50523.0 22798.0 NaN NaN 27725.0 1734.0 NaN ... NaN NaN NaN 3114.0 NaN NaN 0.0 0.0 NaN NaN
Laos 1965 NaN NaN NaN 0.0 0.0 NaN NaN 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Belize 1998 NaN NaN NaN 0.0 NaN NaN NaN 0.0 0.0 NaN ... NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN

5 rows × 24 columns

Item Alfalfa meal and pellets Crops import quantities Almonds shelled Crops import quantities Animal Oil+Fat+Grs Crops import quantities Animal Vegetable Oil Crops import quantities Animal fats Crops import quantities Anise, badian, fennel, coriander Crops import quantities Apples Crops import quantities Apricots Crops import quantities Apricots, dry Crops import quantities Artichokes Crops import quantities ... Wheat+Flour,Wheat Equivalent Crops import quantities Whey, Pres+Concen Crops import quantities Whey, condensed Crops import quantities Whey, dry Crops import quantities Wine Crops import quantities Wine+Vermouth+Sim. Crops import quantities Wool, degreased Crops import quantities Wool, greasy Crops import quantities Wool, hair waste Crops import quantities Yoghurt, concentrated or not Crops import quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Denmark 1995 80.0 5096.0 10718.0 274049.0 10718.0 215.0 34277.0 466.0 655.0 153.0 ... 163069.0 7788.0 94.0 7694.0 129763.0 134206.0 3599.0 27.0 4.0 1474.0
Panama 1997 266.0 22.0 1779.0 31473.0 1779.0 18.0 5594.0 8.0 4.0 2.0 ... 93508.0 145.0 NaN 145.0 1192.0 1546.0 NaN NaN NaN 2.0
Slovenia 2011 2139.0 294.0 1371.0 66360.0 1371.0 501.0 18414.0 1226.0 226.0 16.0 ... 196652.0 210.0 NaN 210.0 9683.0 10129.0 17.0 1.0 34.0 11210.0
Saint Lucia 1980 NaN NaN 0.0 49.0 0.0 0.0 88.0 NaN NaN NaN ... 5939.0 NaN NaN NaN 207.0 207.0 NaN NaN NaN NaN
Estonia 1994 NaN 18.0 243.0 14731.0 243.0 29.0 5508.0 22.0 25.0 NaN ... 44221.0 75.0 NaN 75.0 2024.0 2386.0 60.0 1.0 NaN 3353.0

5 rows × 454 columns

Item Alfalfa meal and pellets Crops export quantities Almonds shelled Crops export quantities Animal Oil+Fat+Grs Crops export quantities Animal Vegetable Oil Crops export quantities Animal fats Crops export quantities Anise, badian, fennel, coriander Crops export quantities Apples Crops export quantities Apricots Crops export quantities Apricots, dry Crops export quantities Artichokes Crops export quantities ... Wheat+Flour,Wheat Equivalent Crops export quantities Whey, Pres+Concen Crops export quantities Whey, condensed Crops export quantities Whey, dry Crops export quantities Wine Crops export quantities Wine+Vermouth+Sim. Crops export quantities Wool, degreased Crops export quantities Wool, greasy Crops export quantities Wool, hair waste Crops export quantities Yoghurt, concentrated or not Crops export quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Argentina 2002 587.0 47.0 12836.0 4590624.0 12836.0 573.0 165944.0 154.0 334.0 0.0 ... 9312659.0 5233.0 NaN 5233.0 123646.0 136952.0 7757.0 15052.0 3366.0 513.0
Uruguay 2010 5940.0 0.0 63307.0 67690.0 63307.0 0.0 2403.0 0.0 0.0 0.0 ... 1424140.0 19312.0 NaN 19312.0 2848.0 2848.0 4423.0 12397.0 3869.0 0.0
Italy 1992 32257.0 1964.0 55134.0 567453.0 55134.0 290.0 409243.0 19483.0 135.0 10082.0 ... 2604167.0 32913.0 28527.0 4386.0 1136862.0 1246158.0 1249.0 4467.0 1656.0 167.0
Saint Pierre and Miquelon 2006 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
EU(15)ex.int 2004 118219.0 6951.0 308759.0 3079934.0 308759.0 3873.0 347031.0 14077.0 2373.0 941.0 ... 10659020.0 298171.0 5564.0 292607.0 1440763.0 1547996.0 21420.0 47480.0 8430.0 17071.0

5 rows × 445 columns

In [35]:
# Deal with the NaN that appeared
for df_name in df_useful :
    for column in list(df_useful[df_name]):
        if column not in ['Area', 'Year']:
            df_useful[df_name][column].fillna(df_useful[df_name][column].median(), inplace=True)

1.F.b. Merging everything

In [36]:
uni_df = df_useful['GDP']
for df_name in need_pivot :
    uni_df = pd.merge(uni_df, df_useful[df_name], how='outer', on=['Area', 'Year'])
uni_df = pd.merge(uni_df,df_useful['Consumer price indices'], how='outer', on=['Area', 'Year'])

# Deal with the NaN that appeared
for column in list(uni_df):
    if column not in ['Area', 'Year']:
        uni_df[column].fillna(uni_df[column].median(), inplace=True)
uni_df.sample(30)
Out[36]:
Area Year (GDP, million $) (Anise, badian, fennel, coriander Crops Area harvested, ha) (Apples Crops Area harvested, ha) (Apricots Crops Area harvested, ha) (Areca nuts Crops Area harvested, ha) (Artichokes Crops Area harvested, ha) (Asparagus Crops Area harvested, ha) (Avocados Crops Area harvested, ha) ... (Whey, Pres+Concen Crops export quantities, tonnes) (Whey, condensed Crops export quantities, tonnes) (Whey, dry Crops export quantities, tonnes) (Wine Crops export quantities, tonnes) (Wine+Vermouth+Sim. Crops export quantities, tonnes) (Wool, degreased Crops export quantities, tonnes) (Wool, greasy Crops export quantities, tonnes) (Wool, hair waste Crops export quantities, tonnes) (Yoghurt, concentrated or not Crops export quantities, tonnes) (Consumer price indices, %)
1561 Cambodia 2013 1.526870e+04 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 0.0 1.0 83.5 142.0 0.0 0.0 142.662578
2797 Dominican Republic 2002 2.565474e+04 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 3794.0 ... 5.0 0.0 8.0 23.0 23.0 83.5 142.0 38.0 0.0 36.486598
4025 Hungary 1988 3.198998e+04 2800.0 56000.0 7000.0 20235.0 1800.0 547.0 1250.0 ... 0.0 0.0 0.0 183362.0 214326.0 697.0 0.0 3939.0 0.0 97.895927
7052 Portugal 1979 2.662225e+04 1650.0 27543.0 1989.0 20235.0 1800.0 1404.0 1250.0 ... 17.0 0.0 17.0 140028.0 140283.0 307.0 0.0 107.0 0.0 97.895927
6932 Peru 1997 5.814388e+04 1650.0 12558.0 52.0 20235.0 235.0 16619.0 7246.0 ... 5.0 0.0 5.0 87.0 87.0 0.0 1776.0 607.0 0.0 97.895927
4086 Iceland 2003 1.130409e+04 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 0.0 0.0 0.0 62.0 62.0 620.0 525.0 0.0 3.0 62.528880
5057 Liberia 2008 9.200000e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 0.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 122.655132
8110 Solomon Islands 1979 1.301439e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 97.895927
8518 Swaziland 1973 2.741482e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 13.0 38.0 0.0 97.895927
4649 Kenya 2014 6.139544e+04 206.0 62.0 15.0 20235.0 6.0 22.0 11583.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 154.944454
2979 Equatorial Guinea 2000 1.510819e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 56.005175
6180 Nepal 1981 2.374580e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 0.0 38.0 0.0 97.895927
6592 Occupied Palestinian Territory 1979 8.461073e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 97.895927
6691 Pakistan 1986 4.049785e+04 1650.0 18491.0 5300.0 20235.0 1800.0 1404.0 1250.0 ... 0.0 0.0 0.0 44.0 46.0 9782.0 0.0 14.0 0.0 97.895927
10049 Zimbabwe 1986 8.323694e+03 180.0 650.0 12.0 20235.0 1800.0 1404.0 1250.0 ... 16.0 0.0 16.0 69.0 69.0 83.5 0.0 38.0 0.0 97.895927
8895 Togo 1982 8.207391e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 0.0 0.0 83.5 142.0 38.0 0.0 97.895927
8640 Switzerland 2003 3.519955e+05 1650.0 4410.0 471.0 20235.0 1800.0 1404.0 1250.0 ... 30802.0 0.0 30802.0 1382.0 1416.0 46.0 392.0 87.0 10642.0 98.016567
10288 Belize 1969 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 0.0 0.0 83.5 142.0 38.0 0.0 97.895927
13022 Netherlands 2017 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 107.125726
12645 EU(12)ex.int 1997 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 131010.0 168.0 130842.0 1373653.0 1434389.0 23830.0 17559.0 12919.0 25445.0 97.895927
2694 Djibouti 1991 4.662735e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 97.895927
10680 Equatorial Guinea 1965 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 97.895927
12854 China 2016 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 117.187070
12706 EU(25)ex.int 2002 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 251992.0 2899.0 249093.0 1181518.0 1226369.0 19761.0 25755.0 6821.0 13032.0 97.895927
1982 China 2015 1.115846e+07 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 114.889284
10950 Guam 1965 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 0.0 0.0 83.5 142.0 38.0 0.0 97.895927
12783 Afghanistan 2016 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 120.196509
5555 Marshall Islands 2000 1.109725e+02 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 97.895927
2387 Curaçao 2006 2.457582e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 86.834908
13070 Saint Kitts and Nevis 2017 6.406727e+03 1650.0 10960.0 3000.0 20235.0 1800.0 1404.0 1250.0 ... 5.0 0.0 8.0 44.0 46.0 83.5 142.0 38.0 0.0 100.916408

30 rows × 1387 columns

2.A.a Crops and livestock production and trade

TODO

2.A.b Introducing the concept of food self-sufficiency

In this section we will present and compute the notion of food self-sufficiency.

2.A.b.i Basic idea

One may wonder how to know whether a country produce all the food it needs or not. The notion of food-self-sufficency allows to answer to this question. More formally, it is a rate that decribes how much does a country can satisfy to meet its internal consumption needs by production. It describes the extent to which a country is able to feed its population through its domestic food production. We are interested into this measure since we think it could be correlated with the economic conditions of this country.

2.A.b.ii Formula and computation

In order to compute the food self-sufficiency, we will apply the following formula that gives us the food self-sudfficiency as a percentage :

$$\frac{Production \times 100}{Production + Imports – Exports}$$
In [37]:
all_columns = list(uni_df)
production_columns = []
import_columns = []
export_columns = []
for column in all_columns:
    if (type(column)==tuple) and column[1]=='tonnes':
        if 'export quantities' in column[0]:
            export_columns.append(column)
        elif 'import quantities' in column[0]:
            import_columns.append(column)
        elif 'Production' in column[0]:
            production_columns.append(column)
            
uni_df[('All productions','tonnes')] = 0
for column in production_columns :
    uni_df[('All productions','tonnes')] += uni_df[column]            
uni_df[('All imports','tonnes')] = 0
for column in import_columns :
    uni_df[('All imports','tonnes')] += uni_df[column]            
uni_df[('All exports','tonnes')] = 0
for column in export_columns :
    uni_df[('All exports','tonnes')] += uni_df[column]
            
uni_df[('food self-sufficiency','%')] = 100 * uni_df[('All productions','tonnes')] / (uni_df[('All productions','tonnes')]+uni_df[('All imports','tonnes')]+uni_df[('All exports','tonnes')])
In [38]:
display(uni_df[['Area','Year',('food self-sufficiency','%')]].sample(5))
Area Year (food self-sufficiency, %)
3632 Greece 2009 60.324336
10499 China, Taiwan Province of 1973 73.811271
12997 Martinique 2015 91.903985
10327 Botswana 1963 95.327686
6178 Nepal 1979 96.787584
In [39]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df[('food self-sufficiency','%')]
for year in range(1980, 2010, 5):
    display(year, visualise_world_data_folium(plot, year, False))
    
1980
1985
1990
1995
2000
2005

2.B. Consumer price indices

      1. what is it

      2. why do we care

TODO

2.C. Structure of international trade and historical context

Our dataset contains data for the historical period from 1970 to 2015. In order to be able to correctly interpret the results we are going to see, we first made an historical research on this period. We shortly listed below important events of this period for which we think they have had a significant influence on the agriculture and the economy.

There was the Cold war from 1945 to 1990 with two economic superpowers (USA and USSR). The USSR had been dissoluted in 1991. The Japanese economic miracle occured from 1945 to 1990 and allowed Japan to come out of the disastrous state in which it was at the exit of the WW2 and become world's second largest economy. There has been 2 big oil crisis in 1973 and 1979. There has been many wars (Middle East wars 1973-2000 e.g. Yom Kippur War 1973, Islamic Revolution in Iran 1979, Iran–Iraq war 1980-1988, Gulf war 1990-1991, Yugoslav wars 1991-2001...). We have already seen some consequences of such events by dealing with countries names in a previous section.

The third Agricultural Revolution (also known as Green revolution) occurs form 1960 to 1990 and imporved agricultural productions thanks to fertilizers and chemicals.

The following public-domain image from Wikimedia represents developed countries (blue), developing ones (orange) and least developed ones (red) according to the United Nations and International Monetary Fund. We expect to see similar results with our dataset (GDP).

The following image, also from Wikimedia shows the cumulative commercial balance for the period 1980-2008. We also expect to see similar results with our dataset, but there might be difference as we focus on agriculture.

2.D. Economic classification of countries

In [40]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df["(GDP, million $)"]
for year in range(1980, 2015, 5):
    display(year, visualise_world_data_folium(plot, year, True))
1980
1985
1990
1995
2000
2005
2010

4. Informed plan for next actions

TODO

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]: